Estimating Frequency Moments of Streams
Abstract
We will develop algorithms that approximate $F_k$ in a single pass over the stream using a small amount of memory, $o(n+m)$. Frequency moments have a number of applications. $F_0$ is the number of distinct elements in the stream (which the FM-sketch from last class estimates using $O(\log n)$ space). $F_1$ is the number of elements in the stream, $m$. $F_2$ is used in database optimization engines to estimate self-join size. Consider the query "return all pairs of individuals that are in the same location". Such a query has cardinality approximately $\sum_i m_i^2 / 2$, where $m_i$ is the number of individuals at location $i$. Depending on the estimated size of the query result, the database can decide (without actually evaluating the answer) which query answering strategy is best suited. $F_2$ is also used to measure the information in a stream. In general, $F_k$ represents the degree of skew in the data. If $F_k/F_0$ is large, then some values in the domain repeat more frequently than the rest. Estimating the skew in the data also helps when deciding how to partition data in a distributed system.
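To make the quantities in the abstract concrete, the following is a minimal sketch, assuming the standard definition $F_k = \sum_i f_i^k$, where $f_i$ is the number of occurrences of element $i$. The function names (`frequency_moment`, `same_location_pairs`, `ams_f2_estimate`) are hypothetical. The exact functions store one counter per distinct element, so they serve only as a baseline. The one-pass estimator is the classic AMS "tug-of-war" sketch for $F_2$, shown purely as an illustration of a small-memory approach; the abstract does not say which algorithm the notes develop.

```python
import random
from collections import Counter

def frequency_moment(stream, k):
    """Exact F_k = sum_i f_i^k (baseline: one counter per distinct element)."""
    counts = Counter(stream)
    if k == 0:
        return len(counts)  # F_0 is the number of distinct elements
    return sum(c ** k for c in counts.values())

def same_location_pairs(stream):
    """Exact number of unordered pairs sharing a value: sum_i m_i (m_i - 1) / 2."""
    counts = Counter(stream)
    return sum(c * (c - 1) // 2 for c in counts.values())

PRIME = (1 << 61) - 1  # modulus for the polynomial hash (an illustrative choice)

def ams_f2_estimate(stream, num_counters=400, seed=0):
    """One-pass AMS-style estimate of F_2 for a stream of non-negative integers.

    Each counter keeps Z = sum_i sign(i) * f_i, where sign comes from a random
    cubic polynomial hash (approximately 4-wise independent signs).
    E[Z^2] = F_2, and averaging many counters reduces the variance.
    """
    rng = random.Random(seed)
    coeffs = [[rng.randrange(PRIME) for _ in range(4)] for _ in range(num_counters)]
    counters = [0] * num_counters
    for item in stream:
        x = item % PRIME
        for j, (a, b, c, d) in enumerate(coeffs):
            h = (((a * x + b) * x + c) * x + d) % PRIME
            counters[j] += 1 if h & 1 else -1  # low bit of the hash picks the sign
    return sum(z * z for z in counters) / num_counters

# Location ids of six individuals: three at location 1, two at 2, one at 3.
locations = [1, 2, 1, 3, 1, 2]
print(frequency_moment(locations, 0))  # 3
print(frequency_moment(locations, 1))  # 6
print(frequency_moment(locations, 2))  # 14
print(same_location_pairs(locations))  # 4
print(ams_f2_estimate(locations))      # an estimate close to 14
```

In this example the exact pair count is $(F_2 - F_1)/2 = (14 - 6)/2 = 4$, which $\sum_i m_i^2 / 2$ approximates well when the $m_i$ are large. The sketch keeps only `num_counters` integers and hash coefficients, independent of the stream length.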
Similar resources
Better Bounds for Frequency Moments in Random-Order Streams
Estimating frequency moments of data streams is a very well studied problem [1–3,9,12] and tight bounds are known on the amount of space that is necessary and sufficient when the stream is adversarially ordered. Recently, motivated by various practical considerations and applications in learning and statistics, there has been growing interest in studying streams that are randomly ordered [3,4...
Estimating Entropy of Data Streams Using Compressed Counting
The Shannon entropy is a widely used summary statistic, for example, network traffic measurement, anomaly detection, neural computations, spike trains, etc. This study focuses on estimating Shannon entropy of data streams. It is known that Shannon entropy can be approximated by Rényi entropy or Tsallis entropy, which are both functions of the αth frequency moments and approach Shannon entropy a...
Estimating Hybrid Frequency Moments of Data Streams
We consider the problem of estimating hybrid frequency moments of two dimensional data streams. In this model, data is viewed to be organized in a matrix form $(A_{i,j})_{1 \le i,j \le n}$. The entries $A_{i,j}$ are updated coordinate-wise, in arbitrary order and possibly multiple times. The updates include both increments and decrements to the current value of $A_{i,j}$. The hybrid frequency moment $F_{p,q}(A)$ is defined...
A Very Efficient Scheme for Estimating Entropy of Data Streams Using Compressed Counting
Compressed Counting (CC) was recently proposed for approximating the αth frequency moments of data streams, for 0 < α ≤ 2. Under the relaxed strict-Turnstile model, CC dramatically improves the standard algorithm based on symmetric stable random projections, especially as α → 1. A direct application of CC is to estimate the entropy, which is an important summary statistic in Web/network measure...
Instructor: Chandra Chekuri. Scribe: Chandra Chekuri. 1 Estimating Frequency Moments in Streams
A significant fraction of streaming literature is on the problem of estimating frequency moments. Let $\sigma = a_1, a_2, \ldots, a_m$ be a stream of numbers where, for each $i$, $a_i$ is an integer between 1 and $n$. We will try to stick to the notation of using $m$ for the length of the stream and $n$ for the range of the integers. Let $f_i$ be the number of occurrences (or frequency) of integer $i$ in the stream. We let f ...
Improving Compressed Counting
Compressed Counting (CC) [22] was recently proposed for estimating the αth frequency moments of data streams, where 0 < α ≤ 2. CC can be used for estimating Shannon entropy, which can be approximated by certain functions of the αth frequency moments as α → 1. Monitoring Shannon entropy for anomaly detection (e.g., DDoS attacks) in large networks is an important task. This paper presents a new a...